Abstract: Almost sure bounds are established on the uniform error of smoothing spline estimators in nonparametric regression with random designs. Some results of Einmahl and Mason (2005) are used to derive uniform error bounds for the approximation of the spline smoother by an "equivalent" reproducing kernel regression estimator, as well as for proving uniform error bounds on the reproducing kernel regression estimator itself, uniformly in the smoothing parameter over a wide range. This admits data-driven choices of the smoothing parameter.



1. Introduction 

In this paper, we study uniform error bounds for the smoothing spline estimator of arbitrary order for a nonparametric regression problem. In effect, we approximate the smoothing spline by a kernel-like estimator, and give sharp bounds on the approximation error under very mild conditions on the nonparametric regression problem, as well as on the uniform error of the kernel-like estimator. An application to obtaining confidence bands is pointed out.

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample of the bivariate random variable $(X, Y)$ with $X \in [0, 1]$ almost surely. Assume that

$$ f_o(x) = \mathbb{E}[\, Y \mid X = x \,] \tag{1.1} $$

exists, and that for some natural number $m$,

$$ f_o \in W^{m,\infty}(0, 1), \tag{1.2} $$

where for $a < b$, the Sobolev spaces $W^{m,p}(a, b)$, $1 \le p \le \infty$, are defined as

$$ W^{m,p}(a, b) = \bigl\{\, f \in C^{m-1}[a, b] \;\bigm|\; f^{(m-1)} \text{ absolutely continuous},\ f^{(m)} \in L^p(a, b) \,\bigr\}, \tag{1.3} $$

see, e.g., [2]. Regarding the design, assume that

$$ X_1, X_2, \ldots, X_n \ \text{are independent and identically distributed, having a probability density function } w \text{ with respect to Lebesgue measure on } (0, 1), \tag{1.4} $$

and that

$$ w_1 \le w(t) \le w_2 \quad \text{for all } t \in [0, 1], \tag{1.5} $$

for positive constants $w_1$ and $w_2$.

With the random variable $(X, Y)$, associate the noise $D$ by

$$ D = Y - f_o(X), \tag{1.6} $$

and define $D_i = Y_i - f_o(X_i)$, $i = 1, 2, \ldots, n$. Assume that

$$ \sup_{x \in [0,1]} \mathbb{E}\bigl[\, |D|^{\kappa} \bigm| X = x \,\bigr] < \infty \quad \text{for some } \kappa > 2. \tag{1.7} $$

(With the assumption (1.2), this is equivalent to $\sup_{x} \mathbb{E}[\, |Y|^{\kappa} \mid X = x \,] < \infty$.)



Under the above conditions, uniform error bounds for the Nadaraya-Watson estimator have been established by Deheuvels and Mason [8] for a random choice of the smoothing parameter, and by Einmahl and Mason [12] uniformly in the smoothing parameter over a wide range. We recall that the Nadaraya-Watson estimator is defined as

$$ f^{nh}(x) = \frac{\sum_{i=1}^{n} Y_i\, K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}, $$

where $K_h(t) = h^{-1} K(h^{-1} t)$ for some nice "kernel" $K$. In this case, $f^{nh}(x)$ is an estimator of $f_o(x) = \mathbb{E}[\, Y \mid X = x \,]$. For some earlier results on uniform error bounds for Nadaraya-Watson estimators, see, e.g., [16] and [15].
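As a concrete reference point for the estimator just recalled, here is a minimal Python sketch of the Nadaraya-Watson ratio. It assumes a Gaussian kernel for $K$. This is an illustrative choice, since the text only asks for a "nice" kernel. The function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative 'nice' kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate of E[Y | X = x] from the sample (xs, ys).

    Uses K_h(t) = K(t / h) / h; the factor 1/h cancels in the ratio but is
    kept so that the code mirrors the definition in the text.
    """
    weights = [kernel((x - xi) / h) / h for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights vanish at x; try a larger h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    ys = [math.sin(2.0 * math.pi * x) for x in xs]
    print(nadaraya_watson(0.25, xs, ys, h=0.1))
```

The estimator is a locally weighted average. A constant response is therefore reproduced exactly, whatever the bandwidth.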

For the smoothing spline estimator, we must come to terms with the fact that the estimator is defined implicitly as the solution, denoted by $f = f^{nh}$, of a minimization problem,

$$ \text{minimize } \ \mathrm{LS}(f) \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - Y_i|^2 + h^{2m} \|f^{(m)}\|^2 \quad \text{subject to } \ f \in W^{m,2}(0, 1), \tag{1.8} $$

where $\|\cdot\|$ denotes the $L^2(0,1)$ norm. Thus, the bulk of the paper is devoted to establishing that for all $t \in [0, 1]$,

$$ f^{nh}(t) - \mathbb{E}[\, f^{nh}(t) \mid X_1, \ldots, X_n \,] = \frac{1}{n} \sum_{i=1}^{n} D_i\, \mathcal{R}_{wmh}(X_i, t) + e_{nh}(t), \tag{1.9} $$

where $\|e_{nh}\|_\infty$ is negligible compared to the leading term in (1.9), and $\mathcal{R}_{wmh}$ is the Green's function for a suitable boundary value problem, see (2.13). Here, $\|\cdot\|_\infty$ denotes the $L^\infty(0,1)$ norm. The approach taken follows Eggermont and LaRiccia.
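For $m = 1$, the minimization problem (1.8) can be solved exactly with elementary tools. The sketch below relies on a standard fact: for $m = 1$ and distinct design points, the minimizer is piecewise linear between the sorted $X_i$ and constant outside $[\min_i X_i, \max_i X_i]$. Problem (1.8) therefore reduces to a tridiagonal linear system for the fitted values $f(X_i)$. This is an illustration under these assumptions, in pure Python with hypothetical function names. It is not the authors' method; for general $m$, one would work with natural splines of degree $2m - 1$.

```python
import bisect


def solve_tridiagonal(off, diag, rhs):
    """Solve a symmetric tridiagonal system with the Thomas algorithm.

    `diag` holds the n diagonal entries and `off` the n - 1 entries
    A[i][i+1] = A[i+1][i]. No pivoting, which is fine for the diagonally
    dominant systems built below.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smoothing_spline_m1(xs, ys, h):
    """Solve (1.8) with m = 1, i.e. minimize
        (1/n) sum_i (f(X_i) - Y_i)^2 + h^2 * int_0^1 f'(t)^2 dt,
    returning the sorted design points and the fitted values there.
    """
    pts = sorted(zip(xs, ys))
    knots = [p[0] for p in pts]
    y = [p[1] for p in pts]
    n = len(knots)
    if any(knots[i + 1] <= knots[i] for i in range(n - 1)):
        raise ValueError("design points must be distinct")
    # For piecewise linear f the penalty is h^2 * sum_i (f_{i+1} - f_i)^2 / d_i.
    # Setting the gradient to zero and multiplying by n/2 gives
    # (f_i - Y_i) + sum over neighbours j of (n h^2 / d) (f_i - f_j) = 0.
    weights = [n * h * h / (knots[i + 1] - knots[i]) for i in range(n - 1)]
    diag = [1.0] * n
    for i, wgt in enumerate(weights):
        diag[i] += wgt
        diag[i + 1] += wgt
    off = [-wgt for wgt in weights]
    return knots, solve_tridiagonal(off, diag, y)


def evaluate(knots, values, t):
    """Evaluate the fitted m = 1 spline at t: linear inside, constant outside."""
    if t <= knots[0]:
        return values[0]
    if t >= knots[-1]:
        return values[-1]
    j = bisect.bisect_right(knots, t) - 1
    s = (t - knots[j]) / (knots[j + 1] - knots[j])
    return (1.0 - s) * values[j] + s * values[j + 1]


if __name__ == "__main__":
    knots, vals = smoothing_spline_m1([0.1, 0.3, 0.5, 0.7, 0.9], [1.0, 3.0, 2.0, 5.0, 4.0], h=0.2)
    print([round(v, 4) for v in vals])
    print(evaluate(knots, vals, 0.4))
```

Because the penalty part of the system has zero column sums, the fit preserves the sample mean of the $Y_i$. As $h \to 0$ the fit interpolates the data, and for large $h$ it flattens towards the mean.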

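The precise results are as follows. For $\gamma > 0$, define the intervals

$$ \mathcal{H}_n(\gamma) = \Bigl[\, \gamma \Bigl(\frac{\log n}{n}\Bigr)^{1 - 2/\kappa},\ \frac{1}{2} \,\Bigr], \qquad \mathcal{G}_n(\gamma) = \Bigl[\, \gamma \Bigl(\frac{\log n}{n}\Bigr)^{1 - 2/\lambda},\ \frac{1}{2} \,\Bigr], \tag{1.10} $$

where $\lambda$ is unspecified but satisfies $2 < \lambda < \min(\kappa, 4)$.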
Theorem 1. Under the assumptions (1.4)-(1.7) on the model, the error term $e_{nh}$ in (1.9) satisfies, almost surely,

$$ \Gamma_{UE}(\gamma) \stackrel{\text{def}}{=} \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|e_{nh}\|_\infty}{h^{-1/2} (nh)^{-1} \{\log(1/h) \vee \log\log n\}} < \infty. $$



The uniform-in-bandwidth character of this theorem (which admits random choices of the smoothing parameter) stands out. Regarding the actual error bound, if $h \in \mathcal{G}_n(\gamma)$, then $h \gg (n^{-1} \log n)^{1/2}$ and the error term in (1.9) can be ignored. Note that for $m \ge 2$ and $\kappa > 3$, this covers the optimal $h$, which behaves like $(n^{-1} \log n)^{1/(2m+1)}$. The theorem makes the smoothing spline much more accessible as an object of study. Here, we consider uniform error bounds on the estimator. For cubic smoothing splines in a somewhat different setting, uniform error bounds were derived by Chiang, Rice and Wu [5].
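To see what the intervals in (1.10) look like for a moderate sample size, the following sketch evaluates their endpoints. It compares them with $(n^{-1} \log n)^{1/2}$ and with the optimal order $(n^{-1} \log n)^{1/(2m+1)}$. The values of $n$, $\gamma$, $\kappa$, $\lambda$ and $m$ are illustrative assumptions. Since the statements are asymptotic, the numbers only indicate orders of magnitude.

```python
import math


def bandwidth_ranges(n, gamma, kappa, lam):
    """Endpoints of H_n(gamma) and G_n(gamma) as in (1.10).

    kappa is the moment index from (1.7); lam must satisfy 2 < lam < min(kappa, 4).
    """
    if not 2.0 < lam < min(kappa, 4.0):
        raise ValueError("need 2 < lam < min(kappa, 4)")
    r = math.log(n) / n
    h_range = (gamma * r ** (1.0 - 2.0 / kappa), 0.5)
    g_range = (gamma * r ** (1.0 - 2.0 / lam), 0.5)
    return h_range, g_range


if __name__ == "__main__":
    n, m = 10**6, 2
    h_range, g_range = bandwidth_ranges(n, gamma=1.0, kappa=5.0, lam=3.5)
    r = math.log(n) / n
    print("H_n(gamma):", h_range)  # left endpoint about 1.2e-3
    print("G_n(gamma):", g_range)  # left endpoint about 8.3e-3
    print("(log n / n)^(1/2):", math.sqrt(r))  # about 3.7e-3
    print("(log n / n)^(1/(2m+1)):", r ** (1.0 / (2 * m + 1)))  # about 0.107
```

For these values, $\mathcal{G}_n(\gamma) \subset \mathcal{H}_n(\gamma)$. The left endpoint of $\mathcal{G}_n(\gamma)$ exceeds $(n^{-1}\log n)^{1/2}$, and the optimal-order bandwidth for $m = 2$ lies well inside $\mathcal{G}_n(\gamma)$.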

Main Theorem. Assume the conditions (1.2) through (1.7) on the model (1.1). Then, the spline estimator of order $m$ satisfies, almost surely,

$$ \Omega(\gamma) \stackrel{\text{def}}{=} \limsup_{n \to \infty} \sup_{h \in \mathcal{G}_n(\gamma)} \frac{\|f^{nh} - f_o\|_\infty}{\sqrt{h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty. $$

The constant $\Omega(\gamma)$ depends on the unknown regression function $f_o$ through the bias. If we restrict $h$ such that $h \ll (n^{-1} \log n)^{1/(2m+1)}$, then this dependence disappears. For example, if for $m \ge 2$ and $\kappa > 2 + (1/m)$ we let

$$ \mathcal{F}_n(\gamma) = \bigl[\, \gamma\, (n^{-1} \log n)^{1 - 2/\lambda},\ n^{-1/(2m+1)} \,\bigr], \tag{1.11} $$

then

$$ \Omega_{UE}(\gamma) \stackrel{\text{def}}{=} \limsup_{n \to \infty} \sup_{h \in \mathcal{F}_n(\gamma)} \frac{\|f^{nh} - f_o\|_\infty}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty, \tag{1.12} $$

and $\Omega_{UE}(\gamma)$ does not depend on $f_o$. This has obvious consequences for the construction of confidence bands. If, as seems reasonable, the value of $\Omega_{UE}(\gamma)$ can be determined via bootstrap techniques, then almost sure confidence bands in the spirit of Deheuvels and Mason [8] and the Bonferroni bounds of Eubank and Speckman [14] may be obtained. The full import of this will be explored elsewhere.



2. The smoothing spline estimator 

Let $m \in \mathbb{N}$ and $h > 0$ be fixed. The smoothing spline estimator, denoted by $f^{nh}$, is defined as the solution of the minimization problem (1.8). The problem (1.8) always has solutions, and for $n \ge m$, the solution is unique, almost surely. For more on spline smoothing, see, e.g., [3] or [21].

A closer look at the spline smoothing problem reveals that $f(X_i)$ is well-defined for any $f \in W^{m,2}(0, 1)$. In particular, there exists a constant $c$ such that for all $f \in W^{m,2}(0, 1)$ and all $x \in [0, 1]$,

$$ |f(x)| \le c \,\bigl\{ \|f\|^2 + \|f^{(m)}\|^2 \bigr\}^{1/2}, \tag{2.1} $$

see, e.g., [2]. Then, a simple scaling argument shows that there exists a constant $c_m$ such that for all $0 < h \le 1$, all $f \in W^{m,2}(0, 1)$, and all $t \in [0, 1]$,

$$ |f(t)| \le c_m h^{-1/2} \|f\|_{mh}. \tag{2.2} $$

Here,

$$ \|f\|_{mh} = \bigl\{ \|f\|^2 + h^{2m} \|f^{(m)}\|^2 \bigr\}^{1/2}. \tag{2.3} $$

Of course, the inequality (2.2) is geared towards the uniform design. For the present, "arbitrary" design, it is more appropriate to consider the inner products

$$ \langle f, g \rangle_{wmh} = \langle f, g \rangle_{L^2((0,1),w)} + h^{2m} \langle f^{(m)}, g^{(m)} \rangle_{L^2(0,1)}, \tag{2.4} $$

where $\langle \cdot, \cdot \rangle_{L^2(0,1)}$ is the usual $L^2(0,1)$ inner product and

$$ \langle f, g \rangle_{L^2((0,1),w)} = \int_0^1 f(t)\, g(t)\, w(t)\, dt. \tag{2.5} $$

The norms are then defined by $\|f\|_{wmh} = \{\langle f, f \rangle_{wmh}\}^{1/2}$. Since the design density is bounded and bounded away from zero, see (1.5), the norms $\|\cdot\|_{mh}$ and $\|\cdot\|_{wmh}$ are equivalent, uniformly in $h$. In particular, with the constants $w_1$ and $w_2$ as in (1.5), for all $f \in W^{m,2}(0, 1)$,

$$ w_1 \|f\|_{mh} \le \|f\|_{wmh} \le w_2 \|f\|_{mh}. \tag{2.6} $$

(Note that, actually, $w_1 \le 1 \le w_2$, since $w$ integrates to 1 over $[0, 1]$.) Then, the analogue of (2.2) holds: there exists a constant $c_m$ such that for all $0 < h \le 1$, all $f \in W^{m,2}(0, 1)$, and all $t \in [0, 1]$,

$$ |f(t)| \le c_m h^{-1/2} \|f\|_{wmh}. \tag{2.7} $$

For later use, we quote the following multiplication result, which follows readily with Cauchy-Schwarz: there exists a constant $c$ such that for all $f$ and $g \in W^{1,2}(0, 1)$,

$$ \|fg\|_{L^1((0,1),w)} + h\, \|(fg)'\|_{L^1(0,1)} \le c\, \|f\|_{w1h}\, \|g\|_{w1h}. \tag{2.8} $$

Also, there exist constants $c_{k,k+1}$ such that for all $f \in W^{k+1,2}(0, 1)$,

$$ \|f\|_{wkh} \le c_{k,k+1} \|f\|_{w,k+1,h}. \tag{2.9} $$

The inequality (2.7) says that the linear functionals $f \mapsto f(t)$ are continuous in the $\|\cdot\|_{wmh}$-topology, so that $W^{m,2}(0, 1)$ with the inner product $\langle \cdot, \cdot \rangle_{wmh}$ is a reproducing kernel Hilbert space, see [3]. Thus, by the Riesz representation theorem for bounded linear functionals on a Hilbert space, for each $t$ there exists an element $\mathcal{R}_{wmht} \in W^{m,2}(0, 1)$ such that for all $f \in W^{m,2}(0, 1)$,

$$ f(t) = \langle f, \mathcal{R}_{wmht} \rangle_{wmh}. \tag{2.10} $$

Applying this to $\mathcal{R}_{wmht}$ itself gives $\mathcal{R}_{wmht}(s) = \langle \mathcal{R}_{wmht}, \mathcal{R}_{wmhs} \rangle_{wmh}$, so that it makes sense to define

$$ \mathcal{R}_{wmh}(t, s) = \mathcal{R}_{wmht}(s) = \mathcal{R}_{wmhs}(t) \quad \text{for all } s, t \in [0, 1]. \tag{2.11} $$
Then, again, the inequality (2.7) implies that

$$ \|\mathcal{R}_{wmh}(t, \cdot)\|_{wmh} \le c_m h^{-1/2}, \tag{2.12} $$

with the same constant $c_m$.
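Here is the step behind (2.12), spelled out (our own expansion of it, not a quotation). Apply (2.10) to $f = \mathcal{R}_{wmh}(t, \cdot)$ and then use (2.7) at the point $t$:

$$ \|\mathcal{R}_{wmh}(t, \cdot)\|_{wmh}^2 = \langle \mathcal{R}_{wmh}(t, \cdot), \mathcal{R}_{wmh}(t, \cdot) \rangle_{wmh} = \mathcal{R}_{wmh}(t, t) \le c_m h^{-1/2}\, \|\mathcal{R}_{wmh}(t, \cdot)\|_{wmh}. $$

Dividing by $\|\mathcal{R}_{wmh}(t, \cdot)\|_{wmh}$ gives (2.12). As a by-product, $0 \le \mathcal{R}_{wmh}(t, t) \le c_m^2 h^{-1}$.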

Finally, we observe that reproducing kernels may be interpreted as the Green's functions for appropriate boundary value problems, see, e.g., [9]. In the present case, $\mathcal{R}_{wmh}(t, s)$ is the Green's function for

$$ \begin{aligned} (-h^2)^m u^{(2m)} + w\, u &= v \quad \text{on } (0, 1), \\ u^{(k)}(0) = u^{(k)}(1) &= 0, \quad k = m, \ldots, 2m - 1. \end{aligned} \tag{2.13} $$

In case $w(t) = 1$ for all $t$ (the uniform density), we denote $\mathcal{R}_{wmh}$ by $\mathcal{R}_{mh}$.
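For $m = 1$ and the uniform density, the kernel $\mathcal{R}_{1h}$ can be written down explicitly by solving (2.13) with a point-mass right-hand side. This gives a concrete object against which to check (2.10)-(2.12). The closed form used below is our own computation. It is written in an overflow-free form that exposes the two-sided exponential $(2h)^{-1} e^{-|t-s|/h}$ times boundary corrections.

The sketch checks the reproducing property numerically for $f(s) = s^2$, and checks that $\int_0^1 \mathcal{R}_{1h}(t, s)\, ds = 1$ (take $f = 1$ in (2.10)). It also checks the bound $h\, \mathcal{R}_{1h}(t, t) \le \coth(1/h)$, which is a version of (2.12) because $\|\mathcal{R}_{1h}(t, \cdot)\|_{1h}^2 = \mathcal{R}_{1h}(t, t)$. All function names are hypothetical.

```python
import math


def rk_m1(t, s, h):
    """R_{1h}(t, s) for m = 1, w = 1: Green's function of
    -h^2 u'' + u = v with u'(0) = u'(1) = 0, in overflow-free form."""
    a, b = min(t, s), max(t, s)
    num = (math.exp(-(b - a) / h) * (1.0 + math.exp(-2.0 * a / h))
           * (1.0 + math.exp(-2.0 * (1.0 - b) / h)))
    return num / (2.0 * h * (1.0 - math.exp(-2.0 / h)))


def rk_m1_ds(t, s, h, left):
    """d/ds R_{1h}(t, s) on the branch s <= t (left=True) or s >= t (left=False).
    The derivative jumps by -1/h^2 at s = t, so the branch is chosen explicitly."""
    scale = 1.0 / (2.0 * h * h * (1.0 - math.exp(-2.0 / h)))
    if left:
        return (scale * (1.0 + math.exp(-2.0 * (1.0 - t) / h))
                * (math.exp(-(t - s) / h) - math.exp(-(t + s) / h)))
    return (scale * (1.0 + math.exp(-2.0 * t / h))
            * (math.exp((s + t - 2.0) / h) - math.exp(-(s - t) / h)))


def simpson(f, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    if b <= a:
        return 0.0
    step = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def inner_with_kernel(f, df, t, h):
    """<f, R_{1h}(t, .)>_{1h} = int f R + h^2 int f' dR/ds, split at the kink s = t."""
    left = simpson(lambda s: f(s) * rk_m1(t, s, h) + h * h * df(s) * rk_m1_ds(t, s, h, True), 0.0, t)
    right = simpson(lambda s: f(s) * rk_m1(t, s, h) + h * h * df(s) * rk_m1_ds(t, s, h, False), t, 1.0)
    return left + right


if __name__ == "__main__":
    h, t = 0.1, 0.3
    print(inner_with_kernel(lambda s: s * s, lambda s: 2.0 * s, t, h), t * t)
    print(simpson(lambda s: rk_m1(t, s, h), 0.0, t) + simpson(lambda s: rk_m1(t, s, h), t, 1.0))
    print(max(h * rk_m1(k / 100, k / 100, h) for k in range(101)), 1.0 / math.tanh(1.0 / h))
```

The maximum of $h\, \mathcal{R}_{1h}(t, t)$ is attained at the endpoints $t = 0, 1$, where the boundary corrections double the kernel. In the interior, $h\, \mathcal{R}_{1h}(t, t) \approx 1/2$.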

We finish this section by showing that the little information we have on the reproducing kernels suffices to prove some useful bounds on random sums of the form

$$ \frac{1}{n} \sum_{i=1}^{n} D_i\, f(X_i), $$

with $D_1, D_2, \ldots, D_n$ and $X_1, X_2, \ldots, X_n$ as in Section 1, and $f \in W^{m,2}(0, 1)$ random, i.e., depending on the $D_i$ and $X_i$. To obtain these bounds, let

$$ \mathcal{S}_{nh}(t) = \frac{1}{n} \sum_{i=1}^{n} D_i\, \mathcal{R}_{wmh}(X_i, t), \qquad t \in [0, 1]. \tag{2.14} $$

This is a reproducing-kernel regression estimator for pure noise data.
Lemma 1. For every $f \in W^{m,2}(0, 1)$, random or not,

$$ \Bigl| \frac{1}{n} \sum_{i=1}^{n} D_i\, f(X_i) \Bigr| \le \|f\|_{wmh}\, \|\mathcal{S}_{nh}\|_{wmh}, $$

and under the assumptions (1.4), (1.5) and (1.7), there exists a constant $c_m$ not depending on $h$ such that $\mathbb{E}[\, \|\mathcal{S}_{nh}\|_{wmh}^2 \,] \le c_m (nh)^{-1}$.

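Proof. The identity $\frac{1}{n} \sum_{i=1}^{n} D_i f(X_i) = \langle f, \mathcal{S}_{nh} \rangle_{wmh}$ implies the first bound by way of Cauchy-Schwarz. For the expectation, we have

$$ \mathbb{E}[\, D_i\, \mathcal{R}_{wmh}(X_i, t) \,] = \mathbb{E}\bigl[\, \mathbb{E}[D_i \mid X_i]\, \mathcal{R}_{wmh}(X_i, t) \,\bigr] = 0, $$

and so, since the $D_i\, \mathcal{R}_{wmh}(X_i, \cdot)$, $i = 1, 2, \ldots, n$, are independent and identically distributed (iid), it follows that

$$ \mathbb{E}\bigl[\, \|\mathcal{S}_{nh}\|_{L^2((0,1),w)}^2 \,\bigr] = n^{-2} \sum_{i=1}^{n} \mathbb{E}\bigl[\, \|D_i\, \mathcal{R}_{wmh}(X_i, \cdot)\|_{L^2((0,1),w)}^2 \,\bigr] \le n^{-1} M\, \mathbb{E}\bigl[\, \|\mathcal{R}_{wmh}(X, \cdot)\|_{L^2((0,1),w)}^2 \,\bigr], $$

where

$$ M = \sup_{x \in [0,1]} \mathbb{E}[\, D^2 \mid X = x \,]. \tag{2.15} $$

By (1.7), we have $M < \infty$. Similarly, since the $D_i\, \mathcal{R}_{wmh}^{(m)}(X_i, \cdot)$, $i = 1, 2, \ldots, n$, are iid,

$$ \mathbb{E}\bigl[\, \|\mathcal{S}_{nh}^{(m)}\|_{L^2(0,1)}^2 \,\bigr] \le n^{-1} M\, \mathbb{E}\bigl[\, \|\mathcal{R}_{wmh}^{(m)}(X, \cdot)\|_{L^2(0,1)}^2 \,\bigr]. $$

It follows that

$$ \mathbb{E}\bigl[\, \|\mathcal{S}_{nh}\|_{wmh}^2 \,\bigr] \le n^{-1} M\, \mathbb{E}\bigl[\, \|\mathcal{R}_{wmh}(X, \cdot)\|_{wmh}^2 \,\bigr]. $$

Now, (2.12) takes care of the last norm, giving $\mathbb{E}[\, \|\mathcal{S}_{nh}\|_{wmh}^2 \,] \le c_m^2 M (nh)^{-1}$. □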
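Lemma 1's expectation bound can be checked by simulation in the simplest case. By the reproducing property, $\|\mathcal{S}_{nh}\|_{wmh}^2 = n^{-2} \sum_{i,j} D_i D_j\, \mathcal{R}_{wmh}(X_i, X_j)$. For mean-zero noise independent of a uniform design, its expectation is exactly $n^{-1} \mathbb{E}[D^2] \int_0^1 \mathcal{R}_{1h}(x, x)\, dx$.

The sketch below makes the following assumptions:
- $m = 1$ and $w = 1$;
- standard normal $D_i$ independent of the $X_i$, a special case of (1.7);
- the closed form of $\mathcal{R}_{1h}$ obtained by solving (2.13) for $m = 1$, $w = 1$ (our own computation).

It compares a Monte Carlo average with the exact formula and shows that $nh$ times the expectation is of order one. The sample size, bandwidth, number of replications and seed are arbitrary illustrative choices.

```python
import math
import random


def rk_m1(t, s, h):
    """R_{1h}(t, s) for m = 1, w = 1 (Green's function of (2.13)), overflow-free form."""
    a, b = min(t, s), max(t, s)
    num = (math.exp(-(b - a) / h) * (1.0 + math.exp(-2.0 * a / h))
           * (1.0 + math.exp(-2.0 * (1.0 - b) / h)))
    return num / (2.0 * h * (1.0 - math.exp(-2.0 / h)))


def squared_norm(xs, ds, h):
    """||S_nh||_{1h}^2 = n^{-2} sum_{i,j} D_i D_j R_{1h}(X_i, X_j) (reproducing property)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        total += ds[i] * ds[i] * rk_m1(xs[i], xs[i], h)
        for j in range(i + 1, n):
            total += 2.0 * ds[i] * ds[j] * rk_m1(xs[i], xs[j], h)
    return total / (n * n)


def exact_mean(n, h, panels=2000):
    """n^{-1} int_0^1 R_{1h}(x, x) dx by Simpson's rule: E||S_nh||^2 when E[D^2] = 1."""
    step = 1.0 / panels
    total = rk_m1(0.0, 0.0, h) + rk_m1(1.0, 1.0, h)
    for k in range(1, panels):
        x = k * step
        total += (4.0 if k % 2 else 2.0) * rk_m1(x, x, h)
    return total * step / 3.0 / n


def monte_carlo_mean(n, h, reps, seed=12345):
    """Average of ||S_nh||^2 over simulated samples with X_i ~ U(0,1), D_i ~ N(0,1)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ds = [rng.gauss(0.0, 1.0) for _ in range(n)]
        acc += squared_norm(xs, ds, h)
    return acc / reps


if __name__ == "__main__":
    n, h = 50, 0.1
    exact = exact_mean(n, h)
    print("n h E||S_nh||^2 =", n * h * exact)  # about 0.55 for h = 0.1
    print("Monte Carlo / exact =", monte_carlo_mean(n, h, reps=300) / exact)
```

The identity $\|\mathcal{S}_{nh}\|^2 = n^{-2}\sum_{i,j} D_i D_j \mathcal{R}(X_i, X_j)$ avoids any numerical integration of $\mathcal{S}_{nh}$ itself. It is the same reproducing-kernel trick used in the proof.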



3. Random sums 

In this section, we discuss sharp bounds on the "random sums" $\mathcal{S}_{nh}$ of (2.14), using results of Einmahl and Mason [12] regarding convolution-kernel estimators (in a more general setting). Thus, let

$$ K \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R}), \qquad \int_{\mathbb{R}} K(x)\, dx = 1. \tag{3.1} $$

We also need some restrictions on the "size" of the set of functions on $[0, 1]$,

$$ \mathcal{K} = \bigl\{\, K\bigl(h^{-1}(x - \cdot)\bigr) \bigm| x \in [0, 1],\ 0 < h \le 1 \,\bigr\}. \tag{3.2} $$

First, we need to assume that

$$ \mathcal{K} \text{ is pointwise measurable.} \tag{3.3} $$

For the definition of pointwise measurability, see van der Vaart and Wellner [23].

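Let $Q$ be a probability measure on $([0, 1], \mathcal{B})$, and let $\|\cdot\|_Q$ denote the $L^2(Q)$ metric. For $\varepsilon > 0$, let $N(\varepsilon, \mathcal{K}, \|\cdot\|_Q)$ denote the smallest number of balls of radius $\varepsilon$ in the $\|\cdot\|_Q$ metric needed to cover $\mathcal{K}$, i.e.,

$$ N(\varepsilon, \mathcal{K}, \|\cdot\|_Q) = \min \Bigl\{\, n \in \mathbb{N} \;\Bigm|\; \exists\, g_1, \ldots, g_n \ \forall\, k \in \mathcal{K}: \ \min_{1 \le i \le n} \|k - g_i\|_Q \le \varepsilon \,\Bigr\}. \tag{3.4} $$

Then, let

$$ N(\varepsilon, \mathcal{K}) = \sup_Q N(\varepsilon, \mathcal{K}, \|\cdot\|_Q), \tag{3.5} $$

where the supremum is over all probability measures $Q$ on $([0, 1], \mathcal{B})$. The restriction on the size of $\mathcal{K}$ now takes the form that there exist positive constants $C$ and $\nu$ such that

$$ N(\varepsilon, \mathcal{K}) \le C \varepsilon^{-\nu}, \qquad 0 < \varepsilon < 1. \tag{3.6} $$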

Nolan and Pollard [19], see also [23], show that the condition (3.6) holds if the kernel $K$ satisfies (3.1) and (3.3), and has bounded variation,

$$ K \in BV(\mathbb{R}). \tag{3.7} $$

Whenever $K$ has left and right limits everywhere (so, in particular, when (3.7) holds), then (3.3) holds also.
The object of study is the following kernel "estimator" with "pure noise" data,

$$ S_{nh}(t) = \frac{1}{n} \sum_{i=1}^{n} D_i\, K_h(X_i - t), \qquad t \in [0, 1]. \tag{3.8} $$

We quote, without proof, the following slight modification of Proposition 2 of Einmahl and Mason [12], as it applies to (3.8). The modification involves the omission of the condition of compact support of the kernel $K$, which is permissible since the design is contained in a compact set, to wit the interval $[0, 1]$, see [17]. Recall the definition of $\mathcal{H}_n(\gamma)$ from (1.10).



Proposition 1 (after Einmahl and Mason [12]). Under the assumptions (3.1), (3.3), (3.6), (3.7), and (1.4), (1.5), and (1.7), for every $\gamma > 0$,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|S_{nh}\|_\infty}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty \quad \text{almost surely}. $$

Proof. The proof needs updating in only one spot, viz. the bound (3.20) of the Einmahl and Mason [12] paper needs to be established under the present conditions. However, that just amounts to showing that

$$ \sup_{0 < h \le 1}\ \sup_{t \in [0,1]}\ h\, \mathbb{E}\bigl[\, |D\, K_h(t - X)|^2 \,\bigr] < \infty. $$

Observe that

$$ \mathbb{E}\bigl[\, |D\, K_h(t - X)|^2 \,\bigr] = \mathbb{E}\bigl[\, \mathbb{E}[D^2 \mid X]\, |K_h(t - X)|^2 \,\bigr] \le M\, \mathbb{E}\bigl[\, |K_h(t - X)|^2 \,\bigr], $$

with $M$ as in (2.15). Now,

$$ \mathbb{E}\bigl[\, |K_h(t - X)|^2 \,\bigr] = \int_0^1 h^{-2} K^2\bigl(h^{-1}(t - x)\bigr)\, w(x)\, dx \le w_2\, h^{-1} \int_{\mathbb{R}} K^2(x)\, dx \le c\, h^{-1}, $$

with $c = w_2 \|K\|_{L^1(\mathbb{R})} \|K\|_{L^\infty(\mathbb{R})}$, a constant not depending on $t$. □

Now, we have the task of relating the random sums involving the reproducing 
kernels to sums involving convolution kernels. Obviously, some convolution-kernel- 
like properties of the reproducing kernel are required. 

Definition 1. We say a family $A_h$, $0 < h \le 1$, of functions defined on $[0, 1] \times [0, 1]$ is convolution-like if it satisfies the following conditions: there exists a constant $c$ such that for all $t \in [0, 1]$ and all $h$, $0 < h \le 1$,

$$ \|A_h(\cdot, t)\|_{L^1(0,1)} \le c, \qquad \|A_h(t, \cdot)\|_\infty \le c\, h^{-1}, \qquad |A_h(\cdot, t)|_{BV} \le c\, h^{-1}. $$

Here, $|f|_{BV}$ denotes the total variation of the function $f$ over $[0, 1]$.

The families $h^{\ell}\, \mathcal{R}_{wmh}^{(\ell)}(\cdot, s)$, $0 < h \le 1$, $\ell = 0, 1, \ldots, m$, are indeed convolution-like, as shown in [11]. Here,

$$ \mathcal{R}_{wmh}^{(\ell)}(t, s) = \frac{\partial^{\ell}}{\partial s^{\ell}}\, \mathcal{R}_{wmh}(t, s) \tag{3.9} $$

denotes the $\ell$-th order derivative of $\mathcal{R}_{wmh}(t, s)$ with respect to $s$ (or, by symmetry, with respect to $t$). This result is in the style of results on the "equivalent" kernel for spline smoothing, except that the kernel is not a convolution kernel, that it handles arbitrary design densities subject to the condition (1.5), and that it treats the boundary conditions in (2.13) exactly. The relevant references on equivalent kernels for spline smoothing include [1], [18], [20], and [21].



Now, there is an interesting way of connecting the reproducing kernel sum $\mathcal{S}_{nh}$ to a sum $S_{nh}$ for an appropriate kernel $K$. Define

$$ g(x) = \exp(-x)\, \mathbb{1}(x > 0), \qquad g_h(x) = h^{-1} g(h^{-1} x), \quad x \in \mathbb{R}. \tag{3.10} $$

One verifies that $h\, g_h$ is the fundamental solution for the initial value problem

$$ \begin{aligned} h\, u' + u &= v \quad \text{on } (0, 1), \\ u(0) &= a, \end{aligned} \tag{3.11} $$

i.e., for $1 \le p \le \infty$ and $v \in L^p(0, 1)$, the solution $u$ of the initial value problem (3.11) satisfies $u \in L^p(0, 1)$, and is given by

$$ u(x) = h\, g_h(x)\, u(0) + \int_0^1 g_h(x - z)\, v(z)\, dz, \qquad x \in [0, 1], \tag{3.12} $$

see, e.g., [4], Section 2.1, formulas (10) through (14). Note that the last integral is really only over the interval $[0, x]$. Since $v = h u' + u$, this leads to the integral representation of the function $u$,

$$ u(x) = h\, g_h(x)\, u(0) + \int_0^1 g_h(x - z)\, \bigl\{ h\, u'(z) + u(z) \bigr\}\, dz, \qquad x \in [0, 1]. \tag{3.13} $$
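The integral representation (3.13) is easy to check numerically. The sketch below evaluates its right-hand side with Simpson's rule and compares it with $u(x)$. It uses the illustrative choice $u(z) = \cos(3z)$, $h = 0.2$, an assumption made only for this check.

Two implementation choices are worth flagging. For the boundary term, we use the right-continuous value $e^{-x/h}$ of $h\, g_h(x)$, which equals $u(0)$ at $x = 0$. Inside the integral, we use the continuous extension $h^{-1} e^{-(x-z)/h}$ at $z = x$. The value of $g_h$ at the single point $x - z = 0$ does not affect the integral, but it would spoil the quadrature.

```python
import math


def simpson(f, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    if b <= a:
        return 0.0
    step = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def rhs_3_13(u, du, x, h):
    """h g_h(x) u(0) + int_0^x g_h(x - z) {h u'(z) + u(z)} dz, with g_h(y) = exp(-y/h)/h for y > 0.

    Since g_h(x - z) vanishes for z > x, the integral is taken over [0, x] only.
    """
    boundary = math.exp(-x / h) * u(0.0)
    integral = simpson(lambda z: math.exp(-(x - z) / h) / h * (h * du(z) + u(z)), 0.0, x)
    return boundary + integral


if __name__ == "__main__":
    u = lambda z: math.cos(3.0 * z)
    du = lambda z: -3.0 * math.sin(3.0 * z)
    for x in (0.1, 0.4, 0.7, 1.0):
        print(x, rhs_3_13(u, du, x, h=0.2), u(x))
```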

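Now, one verifies that

$$ \text{the kernel } g \text{ satisfies (3.1), (3.3), and (3.7),} \tag{3.14} $$

so that the class $\mathcal{G}$ generated by $g$,

$$ \mathcal{G} = \bigl\{\, g\bigl(h^{-1}(x - \cdot)\bigr) \bigm| x \in [0, 1],\ 0 < h \le 1 \,\bigr\}, \tag{3.15} $$

satisfies (3.6), i.e.,

$$ N(\varepsilon, \mathcal{G}) \le C \varepsilon^{-\nu}, \qquad 0 < \varepsilon < 1. \tag{3.16} $$

Thus, Proposition 1 would apply to the random sum

$$ s_{nh}(z) = \frac{1}{n} \sum_{i=1}^{n} D_i\, g_h(X_i - z), \qquad z \in [0, 1], \tag{3.17} $$

but first we connect the sums $\mathcal{S}_{nh}$ and $s_{nh}$.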

Lemma 2. Assume that the functions $A_h$, $0 < h \le 1$, are convolution-like in the sense of Definition 1. Then, there exists a constant $c$ such that for all $h$, $0 < h \le 1$, all $D_1, D_2, \ldots, D_n \in \mathbb{R}$, and all positive $X_1, X_2, \ldots, X_n \in [0, 1]$,

$$ \Bigl\| \frac{1}{n} \sum_{i=1}^{n} D_i\, A_h(X_i, \cdot) \Bigr\|_\infty \le c\, \Bigl\| \frac{1}{n} \sum_{i=1}^{n} D_i\, g_h(X_i - \cdot) \Bigr\|_\infty. $$

228 P. P. B. Eggermont and V. N. LaRiccia 

Proof. Assume that $A_h$ is differentiable with respect to its first argument. Then,

$$ |A_h(\cdot, t)|_{BV} = \|A_h'(\cdot, t)\|_{L^1(0,1)}, $$

where the prime $'$ denotes differentiation with respect to the first argument. Now, apply (3.13) to the function $u = A_h(\cdot, t)$ (for fixed $t$), so that for all $x$,

$$ A_h(x, t) = h\, g_h(x)\, A_h(0, t) + \int_0^1 g_h(x - z)\, \bigl\{ h\, A_h'(z, t) + A_h(z, t) \bigr\}\, dz. \tag{3.18} $$

Next, take $x = X_i$ and substitute this into the sum on the left-hand side of the lemma, denoted here by

$$ S_{nh}(t) = \frac{1}{n} \sum_{i=1}^{n} D_i\, A_h(X_i, t), \qquad t \in [0, 1]. $$

Then, we have

$$ S_{nh}(t) = h\, A_h(0, t)\, s_{nh}(0) + \int_0^1 \bigl\{ h\, A_h'(z, t) + A_h(z, t) \bigr\}\, s_{nh}(z)\, dz. \tag{3.19} $$

Now, straightforward bounding gives

$$ \|S_{nh}\|_\infty \le C_0\, |s_{nh}(0)| + C_1\, \|s_{nh}\|_\infty, $$

where $C_0 = h\, \|A_h(0, \cdot)\|_\infty$ and

$$ C_1 = \sup_{t \in [0,1]} \Bigl\{ \|A_h(\cdot, t)\|_{L^1(0,1)} + h\, |A_h(\cdot, t)|_{BV} \Bigr\}. $$

So, by the convolution-like properties of $A_h$, the constants $C_0$ and $C_1$ are bounded, uniformly in $h$. Also, since all of the $X_i$ are positive,

$$ \lim_{z \downarrow 0} s_{nh}(z) = s_{nh}(0), $$

and so, for $C_2 = C_0 + C_1$, we have $\|S_{nh}\|_\infty \le C_2\, \|s_{nh}\|_\infty$.

The extension to the case where $A_h$ is not necessarily differentiable with respect to its first argument follows readily. □

Since the families $h^{\ell}\, \mathcal{R}_{wmh}^{(\ell)}(\cdot, s)$, $0 < h \le 1$, $\ell = 0, 1, \ldots, m$, are convolution-like in the sense of Definition 1, we may apply the lemma to the sum $\mathcal{S}_{nh}$ of (2.14) and its derivatives. This yields

$$ \|h^{\ell}\, \mathcal{S}_{nh}^{(\ell)}\|_\infty \le c\, \|s_{nh}\|_\infty, \qquad \ell = 0, 1, \ldots, m. \tag{3.20} $$

Now, for the model (1.1) through (1.7), the sum $s_{nh}$ of (3.17) may be treated by Proposition 1 above. This proves the following result. (Recall the definition (1.10) of $\mathcal{H}_n(\gamma)$.)

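Theorem 2. Under the assumptions (1.4), (1.5) and (1.7), for $\gamma > 0$ and for $\ell = 0, 1, \ldots, m$,

$$ \Omega_{\infty,\ell}(\gamma) \stackrel{\text{def}}{=} \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|h^{\ell}\, \mathcal{S}_{nh}^{(\ell)}\|_\infty}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty, $$

almost surely.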
It turns out that we need a similar result for the $\|\cdot\|_{wmh}$ norm, which requires a result for the $L^2$ norm. The following is good enough for our purposes. Obviously, with $\mathcal{S}_{nh}^{(m)}$ denoting the $m$-th order derivative of $\mathcal{S}_{nh}$, we have

$$ \|\mathcal{S}_{nh}\|_{L^2((0,1),w)} \le c\, \|\mathcal{S}_{nh}\|_\infty, \qquad \|\mathcal{S}_{nh}^{(m)}\|_{L^2(0,1)} \le \|\mathcal{S}_{nh}^{(m)}\|_\infty, $$

with $c = \sqrt{w_2}$, and then Theorem 2 gives useful bounds for the $\|\cdot\|_{wmh}$ norm.

Corollary 3. Under the conditions of Theorem 2, we have, almost surely,

$$ \Omega_{wmh}(\gamma) \stackrel{\text{def}}{=} \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|\mathcal{S}_{nh}\|_{wmh}}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty. $$

4. The design sums 

The reproducing kernel Hilbert space set-up is also useful for connecting random design sums

$$ \frac{1}{n} \sum_{i=1}^{n} |f(X_i)|^2 $$

to their (partial?) expectations

$$ \int_0^1 |f(x)|^2\, w(x)\, dx, $$

for random functions $f$. In particular, we prove the following almost sure result. The range of the smoothing parameter is much larger here than it was before, although we only need it for $h \in \mathcal{H}_n(\gamma)$, see (1.10). Here, let

$$ \mathcal{D}_n(\gamma) = \bigl[\, \gamma\, n^{-1} \log n,\ 1 \,\bigr]. \tag{4.1} $$

Theorem 4. Under the assumptions (1.4) and (1.5), for all $f \in W^{m,2}(0, 1)$,

$$ \frac{1}{n} \sum_{i=1}^{n} |f(X_i)|^2 + h^{2m} \|f^{(m)}\|^2 \ge r_{nh}\, \|f\|_{wmh}^2, $$

where $\liminf_{n \to \infty} \inf_{h} r_{nh} = 1$ almost surely.

To prove this, let $W$ be the (cumulative) distribution function corresponding to the design density $w$, let $W_n$ be the empirical distribution function of the design $X_1, X_2, \ldots, X_n$, and introduce the "design sums"

$$ w_{nh}(t) = g_h * dW_n(t) \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} g_h(X_i - t), \qquad t \in [0, 1], \tag{4.2} $$

which is a convolution-kernel density estimator, and its expectation

$$ \mathbb{E}[\, w_{nh}(t) \,] = g_h * dW(t) = \int_0^1 g_h(\tau - t)\, w(\tau)\, d\tau, \qquad t \in [0, 1]. \tag{4.3} $$
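As an illustration of the design sums (4.2)-(4.3), the sketch below computes $w_{nh}$ for a simulated uniform design and reports the largest deviation from its expectation over a grid. For a uniform design, (4.3) evaluates to $\mathbb{E}[w_{nh}(t)] = 1 - e^{-(1-t)/h}$. The uniform design, sample sizes, bandwidth and seed are assumptions made for the example. Proposition 2 below is what quantifies the size of this deviation uniformly in $h$.

```python
import math
import random


def design_sum(t, xs, h):
    """w_nh(t) = n^{-1} sum_i g_h(X_i - t), with g_h(y) = exp(-y/h)/h for y > 0 and 0 otherwise."""
    total = 0.0
    for x in xs:
        y = x - t
        if y > 0.0:
            total += math.exp(-y / h) / h
    return total / len(xs)


def expected_design_sum_uniform(t, h):
    """E[w_nh(t)] = int_t^1 h^{-1} exp(-(tau - t)/h) dtau = 1 - exp(-(1 - t)/h) when w = 1."""
    return 1.0 - math.exp(-(1.0 - t) / h)


def sup_deviation(n, h, grid=50, seed=2024):
    """max over t = k/grid of |w_nh(t) - E[w_nh(t)]| for one simulated uniform design."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return max(
        abs(design_sum(k / grid, xs, h) - expected_design_sum_uniform(k / grid, h))
        for k in range(grid + 1)
    )


if __name__ == "__main__":
    for n in (500, 5000, 20000):
        print(n, sup_deviation(n, h=0.05))
```

The one-sided kernel $g_h$ only looks to the right of $t$. This is why the expectation drops to zero at $t = 1$ rather than staying near the true density $w = 1$.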

We will use Theorem 1 of Einmahl and Mason [12], quoted here for convenience. (This time, no modifications are necessary.)




Proposition 2 ([12]). Under the assumptions (3.1), (3.3), (3.6), and (3.7), and (1.4) and (1.5), for every $\gamma > 0$,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{D}_n(\gamma)} \frac{\|w_{nh} - \mathbb{E}[w_{nh}]\|_\infty}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty \quad \text{almost surely}. $$

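To prove Theorem 4, we start with simple "design sums".

Lemma 3. Under the assumptions (1.4) and (1.5), for all $f \in W^{1,2}(0, 1)$,

$$ \Bigl| \int_0^1 f(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\} \Bigr| \le \zeta_{nh}\, \bigl\{ \|f\|_{L^1((0,1),w)} + h\, \|f'\|_{L^1(0,1)} \bigr\}, $$

where

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{D}_n(\gamma)} \frac{\zeta_{nh}}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty \quad \text{almost surely}. $$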

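Proof. With the reproducing kernel Hilbert space trick,

$$ f(t) = \langle f, \mathcal{R}_{w1ht} \rangle_{w1h}, $$

we obtain by linearity and Fubini's theorem that

$$ \int_0^1 f(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\} = \langle f, S_{nh} \rangle_{w1h}, $$

where

$$ S_{nh}(s) = \int_0^1 \mathcal{R}_{w1h}(t, s)\, \bigl\{ dW_n(t) - dW(t) \bigr\}, \qquad s \in [0, 1], $$

is the variance part of the pointwise error of a reproducing-kernel estimator of the design density $w$. Now, straightforward bounding gives

$$ \bigl| \langle f', S_{nh}' \rangle_{L^2(0,1)} \bigr| \le \|f'\|_{L^1(0,1)}\, \|S_{nh}'\|_\infty, $$

so that

$$ \bigl| \langle f, S_{nh} \rangle_{w1h} \bigr| \le \bigl\{ \|f\|_{L^1((0,1),w)} + h\, \|f'\|_{L^1(0,1)} \bigr\} \bigl\{ \|S_{nh}\|_\infty + h\, \|S_{nh}'\|_\infty \bigr\}, $$

with, explicitly,

$$ \begin{aligned} \|S_{nh}\|_\infty &= \Bigl\| \frac{1}{n} \sum_{i=1}^{n} \mathcal{R}_{w1h}(X_i, \cdot) - \mathbb{E}[\mathcal{R}_{w1h}(X_1, \cdot)] \Bigr\|_\infty, \\ h\, \|S_{nh}'\|_\infty &= \Bigl\| \frac{1}{n} \sum_{i=1}^{n} h\, \mathcal{R}_{w1h}^{(1)}(X_i, \cdot) - \mathbb{E}[h\, \mathcal{R}_{w1h}^{(1)}(X_1, \cdot)] \Bigr\|_\infty. \end{aligned} $$

Both of these may be interpreted as the variance part of the uniform error of (reproducing) kernel estimators. As already noted, the families $\mathcal{R}_{w1h}(t, s)$ and $h\, \mathcal{R}_{w1h}^{(1)}(t, s)$ are convolution-like in the sense of Definition 1. Then, by an appeal to (the proof of) Lemma 2 with $D_i = 1$ for all $i$,

$$ \|S_{nh}\|_\infty \le C\, \|g_h * \{dW_n - dW\}\|_\infty, \qquad h\, \|S_{nh}'\|_\infty \le C_1\, \|g_h * \{dW_n - dW\}\|_\infty, $$

for suitable constants $C$ and $C_1$. Now, an appeal to Theorem 1 of Einmahl and Mason [12], see Proposition 2 above, clinches the deal. □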

We now get the following lemma, which immediately implies Theorem 4.

Lemma 4. Under the assumptions (1.4) and (1.5), for all $f, g \in W^{m,2}(0, 1)$,

$$ \Bigl| \int_0^1 f(t)\, g(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\} \Bigr| \le \eta_{nh}\, \|f\|_{wmh}\, \|g\|_{wmh}, $$

where

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{D}_n(\gamma)} \frac{\eta_{nh}}{\sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty \quad \text{almost surely}. $$

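Proof. From Lemma 3, we get the bound

$$ \Bigl| \int_0^1 f(t)\, g(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\} \Bigr| \le \zeta_{nh}\, \bigl\{ \|fg\|_{L^1((0,1),w)} + h\, \|(fg)'\|_{L^1(0,1)} \bigr\}, $$

with the requisite behavior of $\zeta_{nh}$. Now, from (2.8),

$$ \|fg\|_{L^1((0,1),w)} + h\, \|(fg)'\|_{L^1(0,1)} \le c\, \|f\|_{w1h}\, \|g\|_{w1h}, $$

for an appropriate constant $c$. Finally, (2.9) gives $\|f\|_{w1h} \le c\, \|f\|_{wmh}$, again for an appropriate constant $c$, and likewise for $g$. Thus, $\eta_{nh}$ satisfies $\eta_{nh} \le c\, \zeta_{nh}$, and the lemma follows. □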



5. $L^2$ error bounds

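We are now ready to prove almost sure bounds on $\|f^{nh} - f_o\|_{wmh}^2$ for the spline smoother $f^{nh}$. The starting point is the quadratic Taylor expansion of the objective function $\mathrm{LS}(f)$ of (1.8) around its minimizer. Let

$$ \varepsilon = f^{nh} - f_o. \tag{5.1} $$

Since the Gateaux variation of $\mathrm{LS}$ at its minimizer vanishes, this gives

$$ \frac{1}{n} \sum_{i=1}^{n} |\varepsilon(X_i)|^2 + h^{2m} \|\varepsilon^{(m)}\|^2 = \mathrm{LS}(f_o) - \mathrm{LS}(f^{nh}). \tag{5.2} $$

Now, again, a simple quadratic Taylor expansion around $f_o$ gives

$$ \mathrm{LS}(f_o) - \mathrm{LS}(f^{nh}) = -\frac{1}{n} \sum_{i=1}^{n} |\varepsilon(X_i)|^2 + \frac{2}{n} \sum_{i=1}^{n} D_i\, \varepsilon(X_i) - h^{2m} \|\varepsilon^{(m)}\|^2 - 2 h^{2m} \langle f_o^{(m)}, \varepsilon^{(m)} \rangle, \tag{5.3} $$

and so, after substitution into (5.2),

$$ \frac{1}{n} \sum_{i=1}^{n} |\varepsilon(X_i)|^2 + h^{2m} \|\varepsilon^{(m)}\|^2 = \frac{1}{n} \sum_{i=1}^{n} D_i\, \varepsilon(X_i) - h^{2m} \langle f_o^{(m)}, \varepsilon^{(m)} \rangle. \tag{5.4} $$

This is similar to the development in [22].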




Now, with Lemma 1, Theorem 4, and Cauchy-Schwarz, one obtains

$$ r_{nh}\, \|\varepsilon\|_{wmh}^2 \le \|\varepsilon\|_{wmh} \bigl\{ \|\mathcal{S}_{nh}\|_{wmh} + h^m \|f_o^{(m)}\| \bigr\}, \tag{5.5} $$

where we took the liberty of using $h^m \|\varepsilon^{(m)}\| \le \|\varepsilon\|_{wmh}$. It follows that

$$ r_{nh}\, \|\varepsilon\|_{wmh} \le \|\mathcal{S}_{nh}\|_{wmh} + h^m \|f_o^{(m)}\|, \tag{5.6} $$

and the following result emerges.

Theorem 5. For the model (1.1) under the assumptions (1.2) and (1.4) through (1.7), with $\mathcal{H}_n(\gamma)$ defined in (1.10), for $\gamma > 0$, almost surely,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|f^{nh} - f_o\|_{wmh}}{\sqrt{h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty, $$

and for $h \asymp (n^{-1} \log n)^{1/(2m+1)}$ (deterministic or random),

$$ \|f^{nh} - f_o\|_{wmh} = O\bigl( (n^{-1} \log n)^{m/(2m+1)} \bigr) \quad \text{almost surely}. $$

Proof. This follows from (5.6) and Corollary 3. □
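For convenience, here is the arithmetic behind the rate in Theorem 5, expanded by us for the exact choice $h = (n^{-1} \log n)^{1/(2m+1)}$. For $n \ge 3$, both $\log(1/h)$ and $\log\log n$ are at most $\log n$, so

$$ h^{2m} = \Bigl( \frac{\log n}{n} \Bigr)^{2m/(2m+1)}, \qquad (nh)^{-1} \{\log(1/h) \vee \log\log n\} \le \frac{\log n}{n h} = \Bigl( \frac{\log n}{n} \Bigr)^{1 - 1/(2m+1)} = \Bigl( \frac{\log n}{n} \Bigr)^{2m/(2m+1)}. $$

Hence $\sqrt{h^{2m} + (nh)^{-1}\{\log(1/h) \vee \log\log n\}} \le \sqrt{2}\, (n^{-1} \log n)^{m/(2m+1)}$. For $h \asymp (n^{-1} \log n)^{1/(2m+1)}$, only the constants change.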

The error bound (5.6) appears to be quite sharp. In the next section, we show that $f^{nh}(t) - \mathbb{E}[\, f^{nh}(t) \mid X_1, \ldots, X_n \,] \approx \mathcal{S}_{nh}(t)$ in a precise, useful sense.

6. C-splines 

In this section, we determine a useful, accurate expression for the variance part $f^{nh}(t) - \mathbb{E}[\, f^{nh}(t) \mid X_1, \ldots, X_n \,]$ of the pointwise error $f^{nh}(t) - f_o(t)$, with an eye towards almost sure uniform error bounds. Since the estimator $f^{nh}$ is linear in the data, one sees that

$$ \psi^{nh} = f^{nh} - \mathbb{E}[\, f^{nh} \mid X_1, \ldots, X_n \,] \tag{6.1} $$

is the solution to the "pure noise" problem

$$ \text{minimize } \ \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - D_i|^2 + h^{2m} \|f^{(m)}\|^2 \quad \text{subject to } \ f \in W^{m,2}(0, 1). \tag{6.2} $$

In fact, we show that $\psi^{nh}(t) \approx \psi_{nh}(t)$, where $f = \psi_{nh}$ solves the C(ontinuous)-spline problem

$$ \text{minimize } \ \|f\|_{L^2((0,1),w)}^2 - \frac{2}{n} \sum_{i=1}^{n} D_i\, f(X_i) + h^{2m} \|f^{(m)}\|^2 \quad \text{subject to } \ f \in W^{m,2}(0, 1). \tag{6.3} $$

By the interpretation of the reproducing kernel $\mathcal{R}_{wmh}$ as the Green's function for the boundary value problem (2.13), one observes that $\psi_{nh}$ is given by

$$ \psi_{nh}(t) = \frac{1}{n} \sum_{i=1}^{n} D_i\, \mathcal{R}_{wmh}(X_i, t), \tag{6.4} $$

so that the following almost sure error bounds apply.
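An alternative one-line check of (6.4), our own rather than the authors', is to complete the square in (6.3). It uses only the identity $\frac{1}{n} \sum_i D_i f(X_i) = \langle f, \mathcal{S}_{nh} \rangle_{wmh}$ from the proof of Lemma 1, with $\mathcal{S}_{nh}$ as in (2.14):

$$ \|f\|_{L^2((0,1),w)}^2 + h^{2m} \|f^{(m)}\|^2 - \frac{2}{n} \sum_{i=1}^{n} D_i\, f(X_i) = \|f\|_{wmh}^2 - 2 \langle f, \mathcal{S}_{nh} \rangle_{wmh} = \|f - \mathcal{S}_{nh}\|_{wmh}^2 - \|\mathcal{S}_{nh}\|_{wmh}^2. $$

The unique minimizer is therefore $f = \mathcal{S}_{nh}$, which is exactly (6.4).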
Theorem 6. Under the assumptions of Theorem 5, almost surely, uniformly in $h \in \mathcal{H}_n(\gamma)$, see (1.10),

$$ \begin{aligned} \|\psi^{nh} - \psi_{nh}\|_{wmh} &= O\bigl( (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr), \\ \|\psi^{nh} - \psi_{nh}\|_\infty &= O\bigl( h^{-1/2} (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr). \end{aligned} $$

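Proof. Let $\varepsilon = \psi_{nh} - \psi^{nh}$. Similar to the identity (5.4), one obtains quadratic identities for the discrete and the continuous spline problems. Adding these gives

$$ \|\varepsilon\|_{L^2((0,1),w)}^2 + 2 h^{2m} \|\varepsilon^{(m)}\|^2 + \frac{1}{n} \sum_{i=1}^{n} |\varepsilon(X_i)|^2 = \mathrm{rhs}, \tag{6.5} $$

where

$$ \mathrm{rhs} = \int_0^1 g(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\}, $$

with $g = |\psi_{nh}|^2 - |\psi^{nh}|^2 = (\psi_{nh} + \psi^{nh})\, \varepsilon$, and $\varepsilon$ as above.

Now let $\gamma > 0$ be fixed. Then, the following statements hold uniformly in $h \in \mathcal{H}_n(\gamma)$. Using Lemma 3, one obtains, almost surely,

$$ \begin{aligned} \mathrm{rhs} &= O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr) \bigl\{ \|g\|_{L^1((0,1),w)} + h\, \|g'\|_{L^1(0,1)} \bigr\} \\ &= O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr)\, \|\varepsilon\|_{wmh}\, \|\psi^{nh} + \psi_{nh}\|_{wmh}, \end{aligned} $$

where we used the multiplication result (2.8), and (2.9). Substituting this into (6.5), whose left-hand side is at least $\|\varepsilon\|_{wmh}^2$, one obtains, almost surely,

$$ \|\varepsilon\|_{wmh} = O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr)\, \|\psi^{nh} + \psi_{nh}\|_{wmh}. $$

Now,

$$ \|\psi^{nh} + \psi_{nh}\|_{wmh} \le \|\psi^{nh}\|_{wmh} + \|\psi_{nh}\|_{wmh}, $$

and consequently, by (5.6) with $f_o = 0$ (that is, Theorem 5 applied with $f_o = 0$), and by (6.4) and Corollary 3,

$$ \|\psi^{nh} + \psi_{nh}\|_{wmh} = O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr) \quad \text{almost surely}. $$

Thus,

$$ \|\varepsilon\|_{wmh} = O\bigl( (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr), $$

and so, at the loss of a factor $h^{-1/2}$ (see (2.7)),

$$ \|\varepsilon\|_\infty = O\bigl( h^{-1/2} (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr). $$

The theorem has been proved. □

The above completes the proof of Theorem 1.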

7. C-splines: the Bias 

In considering the bias of the estimator $f^{nh}$, note that $f_h = \mathbb{E}[\, f^{nh} \mid X_1, \ldots, X_n \,]$ is the solution of (1.8) with $D_i = 0$, $i = 1, 2, \ldots, n$, i.e., the solution of the discrete noiseless problem

$$ \text{minimize } \ \mathrm{DN}(f) \stackrel{\text{def}}{=} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - f_o(X_i)|^2 + h^{2m} \|f^{(m)}\|^2 \quad \text{subject to } \ f \in W^{m,2}(0, 1). \tag{7.1} $$
Note that the randomness in $f_h$ is due to the randomness of the design. We must compare $f_h$ to $f_o$, but it is easier to first compare $f_h$ to $\varphi_h$, the solution of the continuous noiseless problem

$$ \text{minimize } \ \mathrm{CN}(f) \stackrel{\text{def}}{=} \|f - f_o\|_{L^2((0,1),w)}^2 + h^{2m} \|f^{(m)}\|^2 \quad \text{subject to } \ f \in W^{m,2}(0, 1). \tag{7.2} $$

In this section we prove the following theorem on the conditional bias. Note the "restricted" set $\mathcal{G}_n(\gamma)$ of allowable values of $h$.

Theorem 7. Under the assumptions of Theorem 5, almost surely,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{G}_n(\gamma)} \frac{\|f_h - f_o\|_\infty}{\sqrt{h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty. $$

The proof goes again by way of the reproducing kernel approximation, and follows without further ado from the following two lemmas.

Lemma 5. Under the assumptions of Theorem 5, almost surely,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|\varphi_h - f_h\|_\infty}{h^{-1/2} \bigl\{ h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr\}} < \infty. $$

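Proof. Let $\varepsilon = \varphi_h - f_h$. Similar to the derivations of (5.6) and (6.5), one obtains

$$ \|\varepsilon\|_{wmh}^2 + h^{2m} \|\varepsilon^{(m)}\|^2 + \frac{1}{n} \sum_{i=1}^{n} |\varepsilon(X_i)|^2 = \mathrm{rhs}, \tag{7.3} $$

where $\mathrm{rhs} = \mathrm{DN}(\varphi_h) - \mathrm{DN}(f_h) + \mathrm{CN}(f_h) - \mathrm{CN}(\varphi_h)$. This simplifies to

$$ \mathrm{rhs} = \int_0^1 g(t)\, \bigl\{ dW_n(t) - dW(t) \bigr\}, $$

with $W$ and $W_n$ as in Lemma 3, and

$$ g(t) = |\varphi_h(t) - f_o(t)|^2 - |f_h(t) - f_o(t)|^2 = \bigl( \varphi_h(t) + f_h(t) - 2 f_o(t) \bigr)\, \varepsilon(t). $$

By Lemma 4, we get that

$$ \mathrm{rhs} \le \eta_{nh}\, \|\varphi_h + f_h - 2 f_o\|_{wmh}\, \|\varepsilon\|_{wmh}, \tag{7.4} $$

with

$$ \eta_{nh} = O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr) \tag{7.5} $$

almost surely, uniformly in $h \in \mathcal{H}_n(\gamma)$. Substituting (7.4) into (7.3), we obtain

$$ \|\varepsilon\|_{wmh} \le \eta_{nh}\, \|\varphi_h + f_h - 2 f_o\|_{wmh}. \tag{7.6} $$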



Now, with regard to bounding $\|\varphi_h + f_h - 2 f_o\|_{wmh}$, the situation is as in Section 5, except that $D_i = 0$, $i = 1, 2, \ldots, n$. Then, almost surely,

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} h^{-m}\, \|f_h - f_o\|_{wmh} < \infty. $$

(Note that there is still randomness in $f_h$ due to the design.) In the same way, one obtains that, deterministically,

$$ \|\varphi_h - f_o\|_{wmh} = O(h^m). $$

It then follows from (7.6) that

$$ \limsup_{n \to \infty} \sup_{h \in \mathcal{H}_n(\gamma)} \frac{\|\varepsilon\|_{wmh}}{h^m \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}} < \infty. $$

Of course,

$$ 2\, h^m \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \le h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\}, $$

so that

$$ \frac{2\, \|\varepsilon\|_{wmh}}{h^{2m} + (nh)^{-1} \{\log(1/h) \vee \log\log n\}} \le \frac{\|\varepsilon\|_{wmh}}{h^m \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}}}. $$

At the loss of a factor $h^{-1/2}$ (see (2.7)), this gives us the required bound on $\|\varepsilon\|_\infty$. □



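Lemma 6. Under the assumption (1.2), there exists a constant $c$ such that

$$ \|\varphi_h - f_o\|_\infty \le c\, h^m\, \|f_o^{(m)}\|_\infty, $$

provided $w$ satisfies (1.5).

Proof. One verifies that $\varphi_h$ is the solution to the differential equation

$$ (-h^2)^m \varphi^{(2m)} + w\, \varphi = w\, f_o \quad \text{on } (0, 1), \tag{7.7} $$

supplemented with the natural boundary conditions. Now, we assume that the regression function $f_o$ satisfies $f_o \in W^{m,\infty}(0, 1)$, so certainly $f_o \in W^{m,2}(0, 1)$. Then, cf. (2.13), the solution of (7.7) is given by

$$ \varphi_h(t) = \int_0^1 \mathcal{R}_{wmh}(t, s)\, w(s)\, f_o(s)\, ds = \langle \mathcal{R}_{wmh}(t, \cdot), f_o \rangle_{L^2((0,1),w)}, $$

and so

$$ \varphi_h(t) = \langle \mathcal{R}_{wmh}(t, \cdot), f_o \rangle_{wmh} - h^{2m} \langle \mathcal{R}_{wmh}^{(m)}(t, \cdot), f_o^{(m)} \rangle_{L^2(0,1)}, $$

with $\mathcal{R}_{wmh}^{(m)}(t, s)$ the $m$-th derivative of $\mathcal{R}_{wmh}(t, s)$ with respect to $s$, as in (3.9). Since $\langle \mathcal{R}_{wmh}(t, \cdot), f_o \rangle_{wmh} = f_o(t)$, and

$$ \bigl| h^{2m} \langle \mathcal{R}_{wmh}^{(m)}(t, \cdot), f_o^{(m)} \rangle_{L^2(0,1)} \bigr| \le h^m\, \|h^m \mathcal{R}_{wmh}^{(m)}(t, \cdot)\|_{L^1(0,1)}\, \|f_o^{(m)}\|_\infty \le c\, h^m\, \|f_o^{(m)}\|_\infty, $$

the last inequality by the convolution-likeness of $h^m \mathcal{R}_{wmh}^{(m)}$, the lemma follows. □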
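Lemma 6 can be illustrated in the simplest case $m = 1$, $w = 1$, where $\varphi_h(t) = \int_0^1 \mathcal{R}_{1h}(t, s)\, f_o(s)\, ds$. The kernel here is the explicit one obtained by solving (2.13), written in our own overflow-free closed form. Two test functions are assumed:

- $f_o(s) = \cos(\pi s)$ satisfies the natural boundary conditions. For it, (7.7) gives $\varphi_h = \cos(\pi\,\cdot)/(1 + \pi^2 h^2)$, so the bias is $O(h^2)$.
- $f_o(s) = s$ does not. A direct solution of (7.7) gives the endpoint bias $h \tanh(1/(2h))$, so the order $h^m$ in Lemma 6 is attained at the boundary.

The sketch checks both statements numerically; all names are hypothetical.

```python
import math


def rk_m1(t, s, h):
    """R_{1h}(t, s) for m = 1, w = 1: Green's function of -h^2 u'' + u = v, u'(0) = u'(1) = 0."""
    a, b = min(t, s), max(t, s)
    num = (math.exp(-(b - a) / h) * (1.0 + math.exp(-2.0 * a / h))
           * (1.0 + math.exp(-2.0 * (1.0 - b) / h)))
    return num / (2.0 * h * (1.0 - math.exp(-2.0 / h)))


def simpson(f, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    if b <= a:
        return 0.0
    step = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def phi_h(f_o, t, h):
    """phi_h(t) = int_0^1 R_{1h}(t, s) f_o(s) ds, split at the kink s = t."""
    def integrand(s):
        return rk_m1(t, s, h) * f_o(s)
    return simpson(integrand, 0.0, t) + simpson(integrand, t, 1.0)


def closed_form_error_cos(h, grid):
    """Largest gap between phi_h and cos(pi t) / (1 + pi^2 h^2) for f_o = cos(pi s)."""
    f_o = lambda s: math.cos(math.pi * s)
    return max(abs(phi_h(f_o, t, h) - f_o(t) / (1.0 + (math.pi * h) ** 2)) for t in grid)


def max_bias_linear(h, grid):
    """sup over the grid of |phi_h(t) - t| for f_o(s) = s, where ||f_o'||_inf = 1."""
    return max(abs(phi_h(lambda s: s, t, h) - t) for t in grid)


if __name__ == "__main__":
    h = 0.05
    grid = [k / 20 for k in range(21)]
    print(closed_form_error_cos(h, grid))  # quadrature-level agreement
    print(max_bias_linear(h, grid) / h, math.tanh(1.0 / (2.0 * h)))  # both about 1
```

In the interior, the bias of $f_o(s) = s$ is exponentially small, because linear functions solve (7.7) there. The $O(h)$ error is entirely a boundary effect.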
8. Uniform error bounds

As an application of the reproducing-kernel approximation to the spline smoother, we obtain uniform error bounds on the spline smoother, uniformly in the bandwidth over a wide (useful) range.

Proof of the Main Theorem. First, let us consider the result of Theorem 1. Recall that the range of the bandwidths is $\mathcal{G}_n(\gamma) = [\, \gamma (n^{-1} \log n)^{1 - 2/\lambda}, 1/2 \,]$ and that $2 < \lambda < \min(\kappa, 4)$, with $n > 3$. So, $h \in \mathcal{G}_n(\gamma)$ implies that $h \gg (n^{-1} \log n)^{1/2}$. Now, from Theorem 1, with $f_h = \mathbb{E}[\, f^{nh} \mid X_1, \ldots, X_n \,]$ and $\mathcal{S}_{nh}$ given by (2.14),

$$ f^{nh}(t) - f_h(t) = \mathcal{S}_{nh}(t) + e_{nh}(t), $$

with, almost surely, uniformly in $h \in \mathcal{H}_n(\gamma)$,

$$ \|e_{nh}\|_\infty = O\bigl( h^{-1/2} (nh)^{-1} \{\log(1/h) \vee \log\log n\} \bigr). $$

For $h \gg (n^{-1} \log n)^{1/2}$, we may conclude that

$$ \|e_{nh}\|_\infty = o\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr), $$

which is negligible compared to the upper bound of Theorem 2,

$$ \|\mathcal{S}_{nh}\|_\infty = O\Bigl( \sqrt{(nh)^{-1} \{\log(1/h) \vee \log\log n\}} \Bigr), $$

almost surely, uniformly in $h \in \mathcal{H}_n(\gamma)$. Finally,

$$ \|f^{nh} - f_o\|_\infty \le \|f^{nh} - f_h\|_\infty + \|f_h - f_o\|_\infty \le \|\mathcal{S}_{nh}\|_\infty + \|e_{nh}\|_\infty + \|f_h - f_o\|_\infty, $$

and Theorem 7 takes care of the last term. □
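The step "for $h \gg (n^{-1} \log n)^{1/2}$" in the proof above amounts to the following one-line computation, expanded by us. Write $L_{nh} = \log(1/h) \vee \log\log n$, so that $L_{nh} \le \log n$ whenever $h \ge 1/n$ and $n \ge 3$. Then

$$ \frac{h^{-1/2} (nh)^{-1} L_{nh}}{\sqrt{(nh)^{-1} L_{nh}}} = \sqrt{\frac{L_{nh}}{n h^2}} \le \frac{(n^{-1} \log n)^{1/2}}{h} \le \gamma^{-1} \Bigl( \frac{\log n}{n} \Bigr)^{2/\lambda - 1/2} \quad \text{for } h \in \mathcal{G}_n(\gamma). $$

Since $\lambda < 4$, the exponent $2/\lambda - 1/2$ is positive. The ratio therefore tends to zero uniformly over $\mathcal{G}_n(\gamma)$.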

Acknowledgment. We thank David Mason for patiently explaining the results of Einmahl and Mason [12] to us, and for straightening out the required modification of their Proposition 2.

References

[1] Abramovich, F. and Grinshtein, V. (1999). Derivation of equivalent kernels for general spline smoothing: a systematic approach. Bernoulli 5 359-379.

[2] Adams, R. A. and Fournier, J. J. F. (2003). Sobolev Spaces, 2nd edition. Academic Press, Amsterdam.

[3] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. MR0051437

[4] Boyce, W. E. and DiPrima, R. C. (1977). Elementary Differential Equations and Boundary Value Problems, 3rd edition. John Wiley and Sons, New York. MR0179403

[5] Chiang, C., Rice, J. and Wu, C. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J. Amer. Statist. Assoc. 96 605-619. MR1946428



Smoothing splines 



237 



[6] Cox, D. D. (1984a). Asymptotics of M-type smoothing splines. Ann. Statist. 11 530-551. MR0696065

[7] Cox, D. D. (1984b). Multivariate smoothing spline functions. SIAM J. Numer. Anal. 21 789-813. MR0749371

[8] Deheuvels, P. and Mason, D. M. (2004). General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 7 225-277. MR2111291

[9] Dolph, C. L. and Woodbury, M. A. (1952). On the relation between Green's functions and covariances of certain stochastic processes and its application to unbiased linear prediction. Trans. Amer. Math. Soc. 72 519-550.

[10] Eggermont, P. P. B. and LaRiccia, V. N. (2006a). Maximum Penalized Likelihood Estimation. Volume II: Regression. Springer-Verlag, New York, in preparation. MR1837879

[11] Eggermont, P. P. B. and LaRiccia, V. N. (2006b). Equivalent kernels for smoothing splines. J. Integral Equations Appl. 18 197-225.

[12] Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33 1380-1403. MR2195639

[13] Eubank, R. L. (1999). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York. MR0934016

[14] Eubank, R. L. and Speckman, P. L. (1993). Confidence bands in nonparametric regression. J. Amer. Statist. Assoc. 88 1287-1301. MR1245362

[15] Härdle, W., Janssen, P. and Serfling, R. (1988). Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist. 16 1428-1449. MR0964932

[16] Konakov, V. D. and Piterbarg, V. I. (1984). On the convergence rate of maximal deviation distribution for kernel regression estimates. J. Multivariate Anal. 15 279-294. MR0768499

[17] Mason, D. M. (2006). Private communication.

[18] Messer, K. and Goldstein, L. (1993). A new class of kernels for nonparametric curve estimation. Ann. Statist. 21 179-196. MR1212172

[19] Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Statist. 15 780-799. MR0888439

[20] Nychka, D. (1995). Splines as local smoothers. Ann. Statist. 23 1175-1197.

[21] Silverman, B. W. (1984). Spline smoothing: the equivalent variable kernel method. Ann. Statist. 12 898-916. MR0751281

[22] van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge University Press. MR1739079

[23] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer-Verlag, New York. MR1385671

[24] Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia. MR1045442



